-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-51030][CORE][TESTS] Add a check before Utils.deleteRecursively(tempDir) to ensure tempDir won't be cleaned up by the ShutdownHook in afterEach of LocalRootDirsTest
#49723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
.github/workflows/build_and_test.yml
Outdated
| node --experimental-vm-modules node_modules/.bin/jest | ||
| maven-test: | ||
| name: "Run Maven Test on macos" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
hasRootAsShutdownDeleteDir check for tempDir in the afterEach of LocalRootDirsTest.hasShutdownDeleteDir check for tempDir in the afterEach of LocalRootDirsTest.
hasShutdownDeleteDir check for tempDir in the afterEach of LocalRootDirsTest.tempDir before manual cleanup in the afterEach method of LocalRootDirsTest
| override def afterEach(): Unit = { | ||
| try { | ||
| Utils.deleteRecursively(tempDir) | ||
| if (!ShutdownHookManager.hasShutdownDeleteDir(tempDir) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| tempDir = Utils.createTempDir(namePrefix = "local") |
In the beforeEach method, tempDir = Utils.createTempDir(namePrefix = "local") invokes ShutdownHookManager.registerShutdownDeleteDir to register tempDir for deletion via a shutdown hook. Therefore, if tempDir remains unchanged during the test case, manual cleanup is unnecessary.
However, given that tempDir is defined as protected var tempDir, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning up tempDir:
tempDirhas not been registered for cleanup via a shutdown hook.tempDiris not a subpath of any path that has been registered for cleanup via a shutdown hook.
This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the tempDir concurrently. It seems that this issue is more prevalent on macOS, so let's conduct several more rounds of testing to ensure its effectiveness.
tempDir before manual cleanup in the afterEach method of LocalRootDirsTesttempDir to ensure that it won't be cleaned up by the ShutdownHook in the afterEach of LocalRootDirsTest
tempDir to ensure that it won't be cleaned up by the ShutdownHook in the afterEach of LocalRootDirsTestUtils.deleteRecursively(tempDir) to ensure tempDir won't be cleaned up by the ShutdownHook in the afterEach of LocalRootDirsTest
Utils.deleteRecursively(tempDir) to ensure tempDir won't be cleaned up by the ShutdownHook in the afterEach of LocalRootDirsTestUtils.deleteRecursively(tempDir) to ensure tempDir won't be cleaned up by the ShutdownHook in afterEach of LocalRootDirsTest
Utils.deleteRecursively(tempDir) to ensure tempDir won't be cleaned up by the ShutdownHook in afterEach of LocalRootDirsTestUtils.deleteRecursively(tempDir) to ensure tempDir won't be cleaned up by the ShutdownHook in afterEach of LocalRootDirsTest
…y(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` ### What changes were proposed in this pull request? This pr add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` to minimize the risk of test failures caused by race conditions in multi-threaded deletion of the `tempdir`. ### Why are the changes needed? In the daily tests on macOS, you may see test failures similar to the following: - https://github.com/apache/spark/actions/runs/13014395379/job/36325277534 ``` SslExternalShuffleServiceSuite: ... - SPARK-38640: memory only blocks can unpersist using shuffle service cache fetching SUITE ABORTED - SslExternalShuffleServiceSuite: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 java.io.IOException: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 at org.apache.spark.network.util.JavaUtils.listFilesSafely(JavaUtils.java:201) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:135) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) ... ``` This appears to be an issue caused by multi-threaded deletion of the `tempdir`. https://github.com/apache/spark/blob/42898c62b4c197e88b842eaa840986cb70d93a9c/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala#L42 Refer to the code above, in `LocalRootDirsTest#beforeEach` method, `tempDir = Utils.createTempDir(namePrefix = "local")` invokes `ShutdownHookManager.registerShutdownDeleteDir` to register `tempDir` for deletion via a shutdown hook. Therefore, if `tempDir` remains unchanged during the test case, manual cleanup is unnecessary in `LocalRootDirsTest#afterEach` method. However, given that `tempDir` is defined as `protected var tempDir`, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning up `tempDir`: 1. `tempDir` has not been registered for cleanup via a shutdown hook. 2. `tempDir` is not a subpath of any path that has been registered for cleanup via a shutdown hook. This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the `tempDir` concurrently. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #49723 from LuciferYang/LocalRootDirsTest-hasRootAsShutdownDeleteDir. Lead-authored-by: yangjie01 <[email protected]> Co-authored-by: YangJie <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 19fbeaa) Signed-off-by: Dongjoon Hyun <[email protected]>
|
Merged to master/4.0. Thank you, @LuciferYang . |
|
Thank you @dongjoon-hyun |
…y(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` ### What changes were proposed in this pull request? This pr add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` to minimize the risk of test failures caused by race conditions in multi-threaded deletion of the `tempdir`. ### Why are the changes needed? In the daily tests on macOS, you may see test failures similar to the following: - https://github.com/apache/spark/actions/runs/13014395379/job/36325277534 ``` SslExternalShuffleServiceSuite: ... - SPARK-38640: memory only blocks can unpersist using shuffle service cache fetching SUITE ABORTED - SslExternalShuffleServiceSuite: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 java.io.IOException: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 at org.apache.spark.network.util.JavaUtils.listFilesSafely(JavaUtils.java:201) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:135) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) ... ``` This appears to be an issue caused by multi-threaded deletion of the `tempdir`. https://github.com/apache/spark/blob/f859317f6262679adaa0d4a435fe424deb79a09e/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala#L42 Refer to the code above, in `LocalRootDirsTest#beforeEach` method, `tempDir = Utils.createTempDir(namePrefix = "local")` invokes `ShutdownHookManager.registerShutdownDeleteDir` to register `tempDir` for deletion via a shutdown hook. Therefore, if `tempDir` remains unchanged during the test case, manual cleanup is unnecessary in `LocalRootDirsTest#afterEach` method. However, given that `tempDir` is defined as `protected var tempDir`, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning up `tempDir`: 1. `tempDir` has not been registered for cleanup via a shutdown hook. 2. `tempDir` is not a subpath of any path that has been registered for cleanup via a shutdown hook. This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the `tempDir` concurrently. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49723 from LuciferYang/LocalRootDirsTest-hasRootAsShutdownDeleteDir. Lead-authored-by: yangjie01 <[email protected]> Co-authored-by: YangJie <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1990d3b) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This pr add a check before
Utils.deleteRecursively(tempDir)to ensuretempDirwon't be cleaned up by the ShutdownHook inafterEachofLocalRootDirsTestto minimize the risk of test failures caused by race conditions in multi-threaded deletion of thetempdir.Why are the changes needed?
In the daily tests on macOS, you may see test failures similar to the following:
This appears to be an issue caused by multi-threaded deletion of the
tempdir.spark/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala
Line 42 in 42898c6
Refer to the code above, in
LocalRootDirsTest#beforeEachmethod,tempDir = Utils.createTempDir(namePrefix = "local")invokesShutdownHookManager.registerShutdownDeleteDirto registertempDirfor deletion via a shutdown hook. Therefore, iftempDirremains unchanged during the test case, manual cleanup is unnecessary inLocalRootDirsTest#afterEachmethod.However, given that
tempDiris defined asprotected var tempDir, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning uptempDir:tempDirhas not been registered for cleanup via a shutdown hook.tempDiris not a subpath of any path that has been registered for cleanup via a shutdown hook.This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the
tempDirconcurrently.Does this PR introduce any user-facing change?
No
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No